Abstract

The sensitivity of a Cherenkov imaging telescope depends strongly on the rejection
of the cosmic-ray background events. The methods which have been used to segregate
the gamma-rays from the source from the background cosmic-rays include
Supercuts/Dynamic Supercuts, the Maximum Likelihood classifier, Kernel methods,
Fractals, Wavelets and Random Forest. While the segregation potential of the
neural network classifier has been investigated in the past with modest results, the main
purpose of this paper is to study the gamma/hadron segregation potential of various ANN
algorithms, some of which are supposed to be more powerful, in terms of better
convergence and lower error, than the commonly used Backpropagation algorithm. The
results obtained suggest that the Levenberg-Marquardt method outperforms all other
methods in the ANN domain. Applying this ANN algorithm to ~101.44 h of Crab Nebula data
collected by the TACTIC telescope during Nov. 10, 2005 - Jan. 30, 2006 yields an
excess of ~(1141±106) with a statistical significance of ~11.07σ, as against an excess of
~(928±100) with a statistical significance of ~9.40σ obtained with the Dynamic Supercuts
selection methodology. The main advantage accruing from the ANN methodology is that it
is more effective at higher energies, and this has allowed us to re-determine the Crab Nebula
energy spectrum in the energy range ~1-24 TeV.
1 Introduction



Gamma-ray photons in the TeV energy range (0.1-50 TeV), to which we shall confine
our attention here, are expected to come from a wide variety of cosmic objects
within and outside our galaxy. Studying this radiation in detail can yield valuable,
and quite often unique, information about the unusual astrophysical environment
characterizing these sources, as also on the intervening intergalactic space [1-3].
While this promise of the cosmic TeV γ-ray probe has been appreciated for quite
long, it was the landmark development of the imaging technique and the principle
of stereoscopic imaging, proposed by the Whipple [4] and the HEGRA [5] groups,
respectively, that revolutionized the field of ground-based very high-energy (VHE)
γ-ray astronomy.

The success of VHE γ-ray astronomy, however, depends critically on the efficiency
of the γ/hadron classification methods employed. Thus, in order to improve the
sensitivity of ground-based telescopes, the main challenge is to improve the existing
γ/hadron segregation methods so as to efficiently reduce the background cosmic-ray
contamination and at the same time retain a higher number of γ-ray events. Detailed
Monte-Carlo simulations, pioneered by Hillas [6], show that the differences between
Cherenkov light emission from air showers initiated by γ-rays and protons
(and other cosmic-ray nuclei) are quite pronounced, with the proton image being
broader and longer as compared to the γ-ray image. This led to the development
and successful usage of several image parameters in tandem, a technique referred
to as the Supercuts/Dynamic Supercuts method. Although the efficiency of this
γ/hadron event classification methodology has been confirmed by the detection of
several γ-ray sources by various independent groups including us, there is a need
to search for still more sensitive/efficient algorithms for γ/hadron segregation. The
conventionally used Supercuts/Dynamic Supercuts method, though using several
image parameters simultaneously, with some of them also being energy dependent,
is still a one dimensional technique, in the sense that the parameters it uses for
classification are treated separately and the possible correlations among the parameters
are ignored.

The multivariate analysis methods, proposed by various groups, for discriminating
between γ-rays and hadrons are the following: Multidimensional Analysis based
on Bayes Decision Rules [7], Mahalanobis Distance [8], Maximum Likelihood [9],
Singular Value Decomposition [10], Fractals and Wavelets [11,12] and Neural
Networks [13,14]. The comparative performance of different multivariate classification
methods like Regression (or Classification) trees, kernel methods, support vector
machines, composite probabilities, linear discriminant analysis and Artificial
Neural Networks (ANN) has also been studied by using Monte Carlo simulated data
for the MAGIC telescope. A detailed compilation of this study is reported in [15].
The results published in the above work indicate that while the performances of
Classification Trees, Kernel and Nearest-Neighbour methods are very close to each
other, the different ANN methods employed (feed-forward, random search and
multilayer perceptron) yield results over a wide range. The feed-forward method gives
a significance of ~8.75σ, while the multilayer perceptron gives a somewhat poorer
significance of ~1.22σ [15]. The discrimination methods like Linear Discriminant
Analysis and Support Vector Machines are found to be inferior compared to
others [15]. The authors of the above work claim that the Random Forest method
outperforms the classical methodologies.

Details regarding implementation of the Random Forest method for the MAGIC
telescope and some of the other recent γ/hadron separation methods developed by
the H.E.S.S. and VERITAS collaborations can be found in [16-20].

The paper is organized in the following manner. Section 2 covers a summary
of some applications where ANN has been used. Salient design features of the
TACTIC telescope and generation of simulated data bases are presented in
sections 3 and 4, respectively. Section 5 covers the definition and statistical analysis of
various image parameters. A short introduction to ANN methodology and a brief
description of the ANN algorithms used in the present work are presented in
section 6, so that the manuscript can be followed by researchers who are not experts
in the field of neural networks. Application of the ANN based γ/hadron methodology
to the TACTIC telescope is presented in sections 7 and 8. These two sections
cover the details about training, testing, validation and comparison of the various ANN
algorithms used in the present work. Application of the ANN methodology to the
Crab Nebula and Mrk 421 data collected with the TACTIC telescope is presented
in section 9. A comparison between the Dynamic Supercuts and ANN analysis
methods is described in section 10, and in section 11 we present our conclusions.



2 Brief description of some applications where ANN have been used 

Research activity in the last decade or so has established that ANN based
algorithms are promising alternatives to many conventional classification methods. The
advantages of ANN over the conventionally used methods are mainly the following.
Firstly, ANN are data driven, self-adaptive methods, since they adjust themselves
to given data without any explicit specification of the functional form for the
underlying model. Secondly, they are universal function approximators, as they can
approximate any function with an arbitrary accuracy [21]. Third and most important,
ANN are able to estimate the posterior probability, which provides the basis
for establishing a classification rule and performing statistical analysis [22,23].
These statistical methods, though important for classification, are merely based on
Bayesian decision theory, in which posterior probability plays a central role. The fact
that ANN can provide an estimate of posterior probability implicitly establishes the
strong connection between the ANN and statistical methods. A direct comparison
between the two, however, is not possible, as ANN are non-linear and model free
methods, while statistical methods are mostly linear and model based.

Artificial neural networks have been applied quite extensively to particle physics
experiments, including separating gluon jets from quark jets [24] and identification of
the decays of the $Z^0$ boson into $b\bar{b}$ pairs [25]. Application of a feed-forward ANN
classifier, employed by the DELPHI collaboration, for separating hadronic decays
of the $Z^0$ into c and b quark pairs has resulted in an improved determination with
respect to the standard analysis [26]. Superior performance of the Neural Network
approach, compared to other multivariate analysis methods including discriminant
analysis and classification trees, has been reported by LEP/SLC [27] for tagging
of $Z^0 \to b\bar{b}$ events. Details related to the application of ANN to general astronomical
problems can be found in [28].

Several γ-ray astronomy groups have already explored the feasibility of using ANN
for γ/hadron separation work. While nobody has so far worked with primary ANN
(i.e. using the Cherenkov images themselves as inputs to the ANN), the results reported are
mainly from the use of secondary ANN, where various image parameters are used
as inputs to the ANN. In an attempt to examine the potential of ANN for improving
the efficiency of the imaging technique, γ-ray and proton acceptances of ~40% and
~0.7%, respectively, were achieved by Vaze [29] by using 8 image parameters as
inputs to the ANN. A detailed study of applying ANN to imaging telescope data was
attempted by Reynolds and Fegan [14], and the results of their study indicate that the
ANN method, although superior to other methods like maximum likelihood
and singular value decomposition, does not yield better results than the Supercuts
method. The work reported by Chilingarian in [13], using 8 image parameters
as inputs to the ANN, on the other hand indicates a slightly better performance of
the ANN method as compared to the Supercuts procedure. Using a network
configuration of 4:5:1 on the Whipple 1988-89 Crab Nebula data, the author has reported
only a marginal enhancement in the statistical significance (viz., ~35.80σ as against
~34.30σ obtained with the Supercuts method), but there is a significant increase in
the number of γ-rays retained by the ANN (viz., ~3420 as against ~2686 obtained
with the Supercuts method). Application of the Fourier transform to Cherenkov images,
followed by the use of the resulting spatial frequency components as inputs to a Kohonen
unsupervised neural network for classification, has been reported by Lang [30]. The
performance of Multifractal and Wavelet parameters was examined by the HEGRA
collaboration in [31] by using a data sample from the Mkn 501 observation. The
authors of the above work report that combining Hillas and multifractal parameters
using a neural network yields a slight improvement in performance as compared to
the Hillas parameters used alone.

There are also many other assorted [32,33] and non-imaging applications where ANN
have been applied, including data collected by extensive air shower arrays.
Bussino and Mari [34] employed a backpropagation based ANN model for separating
electromagnetic and hadronic showers detected by an air shower array. They
achieved ~75% identification for γ-rays and ~74% identification for protons.
Maneva et al. [35] used an ANN algorithm for the CELESTE data. Dumora et al. [36]
have also reported promising results for CELESTE data, where the ANN method was
used for discriminating the γ/hadron Cherenkov events for the wavefront sampling
telescope. The standard Stuttgart Neural Network Simulator (SNNS) package has
also been used for γ/hadron segregation for the data obtained from the ARGO-YBJ
experiment [37]. Application of a backpropagation based ANN method for separating
γ/hadron events recorded by the HEGRA air shower array has been studied by
Westerhoff et al. [38].

Keeping in view the encouraging results reported in the above cited literature, in
particular the results published in [13,15], we studied the γ/hadron segregation
potential of various ANN algorithms by applying them to the Monte Carlo simulated
data. The idea of applying ANN for determining the energy of the γ-rays from a
point source has already been used by us [39] for determining the energy spectra
of the Crab Nebula, Mrk 421 and Mrk 501, as measured by the TACTIC telescope.



3 TACTIC Telescope 



The TACTIC (TeV Atmospheric Cherenkov Telescope with Imaging Camera) γ-ray
telescope has been in operation at Mt. Abu (24.6°N, 72.7°E, 1300 m asl), a hill
resort in Western India, for the last several years for the study of TeV gamma-ray
emission from celestial sources. The telescope deploys a 349-pixel imaging camera,
with a uniform pixel size of ~0.31° and a ~5.9° x 5.9° field-of-view, to record
atmospheric Cherenkov events produced by an incoming cosmic-ray particle or
a γ-ray photon. The TACTIC light-collector uses 34 front-face aluminum-coated,
glass spherical mirrors of 60 cm diameter each, with a focal length of ~400 cm. The
point-spread function has a HWHM of ~0.185° (= 12.5 mm) and D90 ~0.34°
(= 22.8 mm). Here, D90 is defined as the diameter of a circle, concentric with the
centroid of the image, within which 90% of the reflected rays lie. The innermost 121
pixels (11 x 11 matrix) are used for generating the event trigger, based on a
predecided trigger criterion which is either Nearest Neighbour Pairs (NNP) or Nearest
Neighbour Non-collinear Triplets. Apart from generating the prompt trigger with a
coincidence gate width of ~18 ns, the trigger generator has a provision for producing
a chance coincidence output based on $^{12}C_2$ combinations from various groups
of 12 closely spaced channels.

The data acquisition and control system of the telescope [40] is designed around a
network of PCs running the QNX (version 4.25) real-time operating system. The
triggered events are digitized by CAMAC based 12-bit Charge to Digital Converters
(CDC) which have a full scale range of 600 pC. The relative gain of the
photomultiplier tubes (PMT) is monitored regularly, once every 15 minutes, by flashing
a blue LED placed at a distance of ~1.5 m from the camera. While one PC is used to
monitor the scaler rates and control the high voltage of the PMTs, the other
PC handles the data acquisition of the atmospheric Cherenkov events and LED
calibration data. These two front-end PCs, referred to as the rate stabilization and
the data acquisition nodes respectively, along with a master node form the multinode
Data Acquisition and Control network of the TACTIC imaging telescope. The
telescope has a pointing and tracking accuracy of better than ±3 arc-minutes. The
tracking accuracy is checked on a regular basis with so-called "point runs", where
an optical star having its declination close to that of the candidate γ-ray source is
tracked continuously for about 5 hours. The point run calibration data (corrected
zenith and azimuth angle of the telescope when the star image is centered) are then
incorporated in the telescope drive system software or analysis software, so that
appropriate corrections can be applied either directly in real time or in an offline
manner during data analysis.

The telescope records a cosmic-ray event rate of ~2.0 Hz at a typical zenith angle
of 15° and operates at a γ-ray threshold energy of ~1.2 TeV. The telescope has
a 5σ sensitivity of detecting the Crab Nebula in 25 hours of observation time and
has so far detected γ-ray emission from the Crab Nebula, Mrk 421 and Mrk 501.
Details of the instrumentation aspects of the telescope and the results obtained on various
candidate γ-ray sources, including the energy spectra obtained from the Crab Nebula,
Mrk 421 and Mrk 501, are discussed in [41-47].



4 Simulation methodology and data-base generation 

We have used the CORSIKA (version 5.6211) air shower simulation code [48],
with the Cherenkov option, for generating the simulated data-base for γ-ray and
hadron showers. This data-base is valid for the Mt. Abu observatory altitude of 1300 m,
with appropriate values of 35.86 μT and 26.6 μT, respectively, for the horizontal
and the vertical components of the terrestrial magnetic field. The first part of the
simulation work comprised generating the air showers induced by different primaries
and recording the relevant raw Cherenkov data. Folding in the light collector
characteristics and the PMT detector response was performed in the second part. We have
generated a simulated data-base of ~39000 γ-ray showers in the energy range
0.2-27 TeV with an impact parameter up to 250 m. These showers are generated at
5 different zenith angles (θ = 5°, 15°, 25°, 35° and 45°). Similarly, a data-base of
~40000 proton initiated showers, in the energy range 0.4-54 TeV, within a
field of view of ~6.6° x 6.6° around the pointing direction of the telescope, has also
been generated by us. It is important to mention here that the γ-ray showers and
the proton showers have not been generated according to a power law distribution.
However, appropriate γ-ray and proton spectra, with differential spectral indices
of ~-2.6 and ~-2.7, respectively, have been used
while preparing the relevant data files used in the present work. The wavelength
dependence of atmospheric absorption, the spectral response of the PMTs, and the
reflection coefficient of the mirror facets and of the light cones used in the imaging
camera have also been taken into account while generating the data. The obscuration
encountered by the incident and reflected photons during their propagation, due to
the telescope mechanical structure, is also considered. The Cherenkov photon
data-base, consisting of the number of photoelectrons registered by each pixel, is then
subjected to noise injection, trigger condition check and image cleaning. The
resulting two dimensional 'clean' Cherenkov image of each triggered event is then used
to determine the image parameters for shower characterization. Details of the
simulation aspects of the telescope and some of the results obtained, like effective
collection area and differential and integral trigger rates, are discussed in [49].
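As an illustration of how a power law spectrum with a given differential index can be imposed when preparing event samples, the following minimal sketch draws energies from dN/dE ∝ E^(-Γ) by inverting the cumulative distribution. It is not the procedure actually used for the TACTIC data files, which is not described here; the function name `sample_power_law` and the use of a plain uniform deviate are assumptions made purely for illustration.

```python
import random


def sample_power_law(u, e_min, e_max, index):
    """Map a uniform deviate u in [0, 1] to an energy following
    dN/dE proportional to E**(-index) between e_min and e_max.
    (Hypothetical helper; inverse-CDF method, valid for index != 1.)"""
    if index == 1.0:
        raise ValueError("index = 1 needs the logarithmic form of the inverse CDF")
    k = 1.0 - index
    lo, hi = e_min ** k, e_max ** k
    return (lo + u * (hi - lo)) ** (1.0 / k)


# Illustrative use: 1000 energies in the 0.2-27 TeV range quoted above,
# with a differential index of 2.6 (the gamma-ray value quoted in the text).
rng = random.Random(1)
gamma_energies = [sample_power_law(rng.random(), 0.2, 27.0, 2.6) for _ in range(1000)]
```

Energies drawn in this way could, for example, be used to pick the nearest simulated shower from a data-base that was generated with a different energy distribution; whether this or an event-weighting scheme is preferable depends on the statistics available in each energy bin.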



5 Definition and statistical analysis of Cherenkov image parameters 

5.1 Definition of Cherenkov image parameters 

A Cherenkov imaging telescope records the arrival direction of the individual Cherenkov
photons, and the appearance of the recorded image depends upon a number of factors
like the nature and the energy of the incident particle, the arrival direction and
the impact point of the particle trajectory on the ground. The principle of detecting
γ-rays through the imaging technique is depicted in Fig. 1a and Fig. 1b. Segregating
the very high-energy γ-ray events from their cosmic-ray counterpart is achieved by
exploiting the subtle differences that exist in the two dimensional Cherenkov image
characteristics (shape, size and orientation) of the two event species. Gamma-ray
events give rise to shower images which are preferentially oriented towards the
source position in the image plane. Apart from being narrow and compact in shape,
these images have a cometary shape with their light distribution skewed towards
their source position in the image plane, and they become more elongated as the impact
parameter increases. On the other hand, hadronic events give rise to images that
are, on average, broader and longer and are randomly oriented within the field of
view of the camera. For each image, which is essentially elliptical in shape, Hillas
parameters [6,50] are calculated to characterize its shape and orientation. The
parameters, as depicted in Fig. 1c, are obtained using moment analysis and are defined
as: LENGTH - the rms spread of light along the major axis of the image (a measure
of the vertical development of the shower); WIDTH - the rms spread of light
along the minor axis of the image (a measure of the lateral development of the
shower); DISTANCE - the distance from the centroid of the image to the centre
of the field of view; ALPHA (α) - the angle between the major axis of the image and a line
joining the centroid of the image to the position of the source in the focal plane;
SIZE - the sum of all the signals recorded in the clean Cherenkov image; FRAC2 -
the degree of light concentration, as determined from the ratio of the two largest
PMT signals to the sum of all signals (also referred to as Cone). In the pioneering
work of the Whipple Observatory [4], only one parameter (AZWIDTH) was used
in selecting γ-ray events. Later, the technique was refined to the Supercuts/Dynamic
Supercuts procedure, where cuts based on the WIDTH and LENGTH of the image
as well as its orientation are used for segregating the γ-rays from the background
cosmic-rays [50].

Fig. 1. (a) The principle of detecting gamma-rays through the imaging technique. (b) Formation
of the Cherenkov image in the focal plane. (c) Definition of the Hillas parameters
characterizing each image and used for rejecting the cosmic-ray background. The ellipse represents
the approximate outline of the shower image in the focal plane of the telescope.
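The moment analysis described in this subsection can be sketched as follows. This is a minimal illustration and not the TACTIC analysis code: the function name `hillas_parameters`, the input layout (a list of pixel positions in degrees with their cleaned signals) and the assumption that the source sits at the camera centre are all choices made for this example.

```python
import math


def hillas_parameters(pixels):
    """Second-moment (Hillas-type) parameters of a cleaned image.

    pixels: list of (x_deg, y_deg, signal) tuples, with the camera centre
    (assumed here to coincide with the source position) at (0, 0).
    """
    size = sum(s for _, _, s in pixels)
    mx = sum(s * x for x, _, s in pixels) / size
    my = sum(s * y for _, y, s in pixels) / size
    sxx = sum(s * x * x for x, _, s in pixels) / size - mx * mx
    syy = sum(s * y * y for _, y, s in pixels) / size - my * my
    sxy = sum(s * x * y for x, y, s in pixels) / size - mx * my

    # Eigenvalues of the 2x2 covariance matrix are the squared rms spreads
    # along the major (LENGTH) and minor (WIDTH) axes.
    root = math.hypot(sxx - syy, 2.0 * sxy)
    length = math.sqrt(max(0.0, 0.5 * (sxx + syy + root)))
    width = math.sqrt(max(0.0, 0.5 * (sxx + syy - root)))

    # Distance of the image centroid from the camera centre.
    distance = math.hypot(mx, my)

    # Angle between the major axis and the centroid-to-source line,
    # folded into the range 0-90 degrees.
    axis_angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    line_angle = math.atan2(my, mx)
    diff = abs(axis_angle - line_angle) % math.pi
    alpha = math.degrees(min(diff, math.pi - diff))

    # Fraction of the total light contained in the two brightest pixels.
    frac2 = sum(sorted((s for _, _, s in pixels), reverse=True)[:2]) / size

    return {"SIZE": size, "LENGTH": length, "WIDTH": width,
            "DISTANCE": distance, "ALPHA": alpha, "FRAC2": frac2}
```

For an image whose major axis points at the camera centre the sketch returns ALPHA close to 0°, while an image of the same shape rotated by a quarter turn about its centroid gives ALPHA close to 90°, which is the behaviour exploited when separating point-source γ-rays from the isotropic background.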



5.2 Statistical analysis of various parameters for selecting the optimal features 



The success of any classification technique depends on the proper selection of the
variables which are to be used for the event segregation and on the agreement
between the expected and the actual distributions of these variables. Fig. 2 shows the
distributions of the image parameters LENGTH, WIDTH, DISTANCE and α for
simulated protons and for the actual Cherenkov events recorded by the telescope.
The data plotted here have first been subjected to pre-filtering cuts with SIZE >
50 photoelectrons (pe) and 0.4° < DISTANCE < 1.4°, in order to ensure that
the events recorded are robust and well contained in the camera. The simulated
image parameter distribution of γ-rays has also been shown in the figure for
comparison. The observed image parameter distributions are found to closely match
the distributions obtained from simulations for proton-initiated showers, thus
suggesting that the response of the telescope is reasonably close to that predicted by
simulations.

Fig. 2. Comparison of image parameter distributions (a) LENGTH, (b) WIDTH, (c) DISTANCE
and (d) α from real and the Monte Carlo simulated data for proton events. The
simulated image parameter distribution of γ-rays has also been shown in the figure for
comparison.

For converting the event SIZE, recorded in charge to digital counts, to the
corresponding number of photoelectrons, we have used a conversion factor of 1 pe
= 6.5 counts [42]. In order to understand and improve upon the existing γ/hadron
segregation methods, it is important to estimate the discriminating capability of each
of the Cherenkov image parameters and their correlations [7]. The image parameters
considered for this correlation study are: SIZE, LENGTH, WIDTH, DISTANCE,
FRAC2 and α.
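The pre-filtering cuts and the count-to-photoelectron conversion quoted above amount to a simple event selection, sketched below. The function name `passes_prefilter` and its argument layout are hypothetical; only the numerical values (6.5 counts per pe, SIZE > 50 pe, 0.4° < DISTANCE < 1.4°) are taken from the text.

```python
COUNTS_PER_PE = 6.5  # conversion factor quoted in the text: 1 pe = 6.5 counts


def passes_prefilter(size_counts, distance_deg):
    """Return True if an event survives the pre-filtering cuts.

    size_counts: image SIZE in charge-to-digital counts.
    distance_deg: DISTANCE parameter in degrees.
    """
    size_pe = size_counts / COUNTS_PER_PE
    return size_pe > 50.0 and 0.4 < distance_deg < 1.4


# Example: an event with 650 counts (100 pe) at DISTANCE = 1.0 deg is kept,
# while the same event at DISTANCE = 1.5 deg is rejected.
kept = passes_prefilter(650.0, 1.0)
rejected = passes_prefilter(650.0, 1.5)
```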

In order to select the image parameters which are best suited for γ/hadron separation,
we have applied the following tests: Student's t-test, Welch's t-test, the Mann-Whitney
U-test (also known as the Wilcoxon rank-sum test) and the Kolmogorov-Smirnov
test (KS test) [51]. The Student's t-test and Welch's t-test belong to the category of
parametric tests, which assume that the data are sampled from populations that
follow a Gaussian distribution. While the Student's unpaired t-test assumes that the
two populations have the same variance, Welch's t-test is a modification of the
t-test which does not assume equal variances. Tests that do not make any assumptions
about the population distribution are referred to as nonparametric tests. The
Mann-Whitney U-test and the Kolmogorov-Smirnov test belong to this category of tests.
While the nonparametric tests are appealing because they make fewer assumptions
about the distribution of the data, they are less powerful than the parametric tests.
This means that the corresponding probability values tend to be higher, making it
harder to detect real differences as being statistically significant. When large data
samples are considered, the difference in power is minor. Furthermore, it is worth
mentioning here that the parametric tests are robust to deviations from Gaussian
distributions, so long as the samples are large.
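The difference between the two parametric tests can be made concrete with a short sketch of the two statistics. The function names `student_t` and `welch_t` are illustrative, and the standard textbook definitions are assumed; the sketch computes only the statistic, not the associated probability value.

```python
import math
from statistics import mean, variance


def student_t(a, b):
    """Unpaired Student's t statistic: the two sample variances are pooled,
    i.e. the populations are assumed to have the same variance."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1.0 / na + 1.0 / nb))


def welch_t(a, b):
    """Welch's t statistic: each sample keeps its own variance,
    so equal population variances are not assumed."""
    na, nb = len(a), len(b)
    return (mean(a) - mean(b)) / math.sqrt(variance(a) / na + variance(b) / nb)
```

For samples of equal size the two statistics coincide, which is consistent with the nearly identical Student and Welch values expected when equal numbers of γ-ray and proton events are compared; they differ only when the sample sizes and variances are both unequal.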

In order to apply the above mentioned tests to simulated data of γ-ray and proton
initiated showers, we have used ~6000 events each, at a zenith angle of 25°. The
results of these one-dimensional tests are summarized in Table 1.

Table 1

Statistic values of various parametric and non-parametric statistical tests. A larger value of
the statistic corresponds to a smaller P-value, i.e. to stronger evidence against the null
hypothesis that the γ-ray data sample and the proton data sample come from the same population.

Parameter   Student's t-test   Welch's t-test   Mann-Whitney U-test   KS test
                  (t)               (t)                (z)              (D)

SIZE              1.95              1.94               8.66             0.09
LENGTH          138.80            138.75              90.20             0.85
WIDTH           120.96            120.28              84.75             0.76
DISTANCE         19.65             19.64              17.18             0.18
FRAC2           200.84            200.94              92.69             0.90
ALPHA           112.57            112.53              82.89             0.76

Since the P-values (i.e. the probability of obtaining a statistic at least as large as the
observed one if the null hypothesis, that the γ-ray data sample and the proton data
sample come from the same population, were true) are usually very small, we have
instead used the value of the corresponding statistic for rejecting or accepting the
null hypothesis. In other words, t-statistic values are given in Table 1 for expressing
the results of Student's t-test and Welch's t-test. Similarly, for the Mann-Whitney
U-test the z-statistic values are given in the table (where z = (U - m_U)/σ_U, with
m_U and σ_U as the mean and the standard deviation of U). For the Kolmogorov-Smirnov
test we have calculated the D-statistic (i.e. the maximum vertical distance between the
two cumulative frequency distributions). On examining Table 1 it is evident that four
image parameters (viz., LENGTH, WIDTH, FRAC2 and α) have a significant potential of
providing efficient γ/hadron separation. The larger the value of the corresponding
statistic, the smaller the P-value and hence the stronger the evidence against the null
hypothesis that the γ-ray data sample and the proton data sample come from the same
population.
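The two nonparametric statistics defined above can be sketched in a few lines. The function names are illustrative; the Mann-Whitney z uses the usual large-sample normal approximation without a tie correction, and its sign convention (U counts the pairs in which the first sample exceeds the second) is an assumption of this example.

```python
import math
from bisect import bisect_right


def mann_whitney_z(a, b):
    """z = (U - m_U) / sigma_U for two samples.
    U counts pairs (x in a, y in b) with x > y; ties count one half.
    The double loop is quadratic in the sample size, which is adequate
    for a sketch but slow for very large samples."""
    na, nb = len(a), len(b)
    u = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)
    m_u = na * nb / 2.0
    sigma_u = math.sqrt(na * nb * (na + nb + 1) / 12.0)
    return (u - m_u) / sigma_u


def ks_d(a, b):
    """Maximum vertical distance between the two empirical cumulative
    distribution functions, evaluated at every sample point."""
    sa, sb = sorted(a), sorted(b)
    d = 0.0
    for v in sa + sb:
        fa = bisect_right(sa, v) / len(sa)
        fb = bisect_right(sb, v) / len(sb)
        d = max(d, abs(fa - fb))
    return d
```

Two completely separated samples give D = 1 and a large |z|, whereas two identical samples give D = 0 and z = 0, which brackets the range of values listed in Table 1.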

In order to estimate the statistical relationship between two image parameters, for the
γ-ray data sample and the proton data sample separately, we have also calculated the
Pearson product-moment correlation coefficient. Following the standard procedure,
it is obtained by dividing the covariance of the two variables by the product of their
standard deviations. The closer the coefficient is to either -1 or 1, the stronger the
correlation between the variables. The results of this study, obtained separately for
the γ-ray and proton data samples, are presented in Tables 2 and 3, respectively.
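The definition given above translates directly into code. The following is a minimal sketch with an illustrative function name, `pearson_r`; the normalisation factors of the covariance and of the standard deviations cancel, so they are omitted.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient: covariance of x and y
    divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

Applied to, say, the WIDTH and SIZE columns of an event list, such a function yields one entry of a correlation matrix; repeating it over all parameter pairs fills the full symmetric matrix.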

Table 2

Correlation matrix for simulated γ-ray data sample at a zenith angle of 25°. The values
listed below for each correlation coefficient (numbers within parentheses) are the
corresponding z-statistic values obtained using Fisher transformation.
Notations used are SIZ=SIZE, LEN=LENGTH, WID=WIDTH, DIS=DISTANCE, FR2=FRAC2.

          SIZ         LEN         WID         DIS         FR2          α

SIZ      1.000       0.394       0.474       0.072      -0.441      -0.037
         ( - )     (33.206)    (41.692)     (5.603)    (38.051)     (2.881)

LEN      0.394       1.000       0.615       0.038      -0.709       0.196
       (33.206)      ( - )     (60.452)     (2.908)    (78.069)    (15.466)

WID      0.474       0.615       1.000      -0.396      -0.569       0.456
       (41.692)    (60.452)      ( - )     (33.360)    (53.680)    (39.649)

DIS      0.072       0.038      -0.396       1.000      -0.034      -0.366
        (5.603)     (2.908)   (-33.360)      ( - )     (-2.615)    (30.491)

FR2     -0.441      -0.709      -0.569      -0.034       1.000      -0.064
       (38.051)    (78.069)    (53.680)     (-2.61)      ( - )      (4.927)

α       -0.037       0.196       0.456      -0.366      -0.064       1.000
        (2.881)    (15.466)    (39.649)    (30.491)     (4.927)      ( - )

The values of the t-statistic corresponding to each correlation coefficient are also
given in these Tables (numbers within parentheses). These values can be used for
assessing the significance of the correlation. A larger value of the statistic indicates
that the null hypothesis, namely that the observed value comes from a population in
which the correlation coefficient is ~0, is less likely to hold. If the
correlation coefficient is ρ, the Fisher transformation can be defined as:

$$ z = \frac{1}{2} \ln\left(\frac{1+\rho}{1-\rho}\right) \qquad (1) $$
Table 3

Correlation matrix for simulated proton data sample at a zenith angle of 25°. The values
listed below for each correlation coefficient (numbers within parentheses) are the
corresponding z-statistic values (obtained using Fisher transformation).

          SIZ         LEN         WID         DIS         FR2          α

SIZ      1.000       0.036       0.273      -0.301      -0.083      -0.008
         ( - )      (2.757)    (21.950)     (2.332)     (6.472)     (0.624)

LEN      0.036       1.000       0.360      -0.023      -0.618       0.086
        (2.757)      ( - )     (29.916)     (1.792)    (60.947)     (6.691)

WID      0.273       0.360       1.000      -0.036      -0.510       0.005
       (21.950)    (29.916)      ( - )      (2.806)    (46.998)     (0.368)

DIS     -0.301      -0.023      -0.036       1.000      -0.006       0.012
        (2.332)     (1.792)     (2.806)      ( - )      (0.481)     (0.958)

FR2     -0.083      -0.618      -0.510      -0.006       1.000      -0.028
        (6.472)    (60.947)    (45.998)     (0.481)      ( - )      (2.154)

α       -0.008       0.086       0.005       0.012      -0.028       1.000
        (0.624)     (6.691)     (0.368)     (0.958)     (2.154)      ( - )

The Fisher ρ-to-z transformation [52] has also been applied to assess the
significance of the difference between two correlation coefficients (say $\rho_1$ and $\rho_2$) found
in two independent samples. The relevant expression to calculate this is given by:

$$ z = \frac{z_1 - z_2}{\sqrt{\dfrac{1}{n_1 - 3} + \dfrac{1}{n_2 - 3}}} \qquad (2) $$

where $z_1$ and $z_2$ are the Fisher-transformed values of the two correlation coefficients
$\rho_1$ and $\rho_2$, and $n_1$ and $n_2$ are, respectively, the number of data points used while
calculating $\rho_1$ and $\rho_2$. Table 4 gives the values for
the Fisher matrix of various image parameters for the simulated γ/proton sample.
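The comparison of two correlation coefficients can be illustrated with the following sketch, which assumes the standard form of the test: each coefficient is Fisher-transformed and the difference is divided by sqrt(1/(n1 - 3) + 1/(n2 - 3)). The function names are illustrative, and the sample size of 6000 events per species is an assumption based on the "~6000 events each" quoted earlier.

```python
import math


def fisher_z(rho):
    """Fisher transformation of a correlation coefficient (equal to atanh(rho))."""
    return 0.5 * math.log((1.0 + rho) / (1.0 - rho))


def fisher_difference(rho1, n1, rho2, n2):
    """Significance of the difference between two correlation coefficients
    measured in independent samples of n1 and n2 data points."""
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (fisher_z(rho1) - fisher_z(rho2)) / standard_error


# SIZE-LENGTH correlation: 0.394 for gamma-rays (Table 2) and 0.036 for
# protons (Table 3); 6000 events per sample is assumed.
z_size_length = fisher_difference(0.394, 6000, 0.036, 6000)
```

With these inputs the sketch gives a value of about 20.8, close to the SIZ-LEN entry of Table 4; the small residual difference is attributable to the rounding of the tabulated coefficients and to the approximate sample size.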



On examining Tables 2, 3 and 4, one can select the image parameters for achieving 
optimum 7/hadron segregation. This can be done on the basis of identifying param- 
eters for which the difference between their correlation coefficients is maximum. 
As seen in Table 4, WIDTH-a pair yields the largest Fisher test value. Further- 
more, it is also encouraging to find that the other well known characteristics of 
Cherenkov image parameters are in good agreement with our results. For example, 
dependence of the image shape parameters (i.e LENGTH and WIDTH) on SIZE 
for 7-rays. Both these parameters yield positive correlation coefficient of ~0.394 




(2) 
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Table 4 

Fisher Matrix for the simulated 7/hadron data sample at a zenith angle of 25°. The matrix 
can be used to assess the significance of the difference between two correlation coefficients. 





SIZ 


LEN 


WW 


DIS 


FR2 


a 


SIZ 




20.83 


12.87 


5.6 


21.30 


1.59 


LEN 


20.83 




18.65 


3.32 


8.95 


6.16 


WW 


12.87 


18.65 




20.97 


4.56 


26.70 


DIS 


5.60 


3.32 


20.97 




1.51 


21.68 


FR2 


21.30 


8.95 


4.56 


1.51 




1.96 


a 


1.59 


6.16 


26.7 


21.68 


1.96 





and ~0.474 as shown in Table 2. Since SIZE parameter of an image provides an ap- 
proximate estimate of the 7-ray primary energy both these parameters are expected 
to be correlated with the event SIZE. The modification of the Supercuts procedure 
to Dynamic (or extended) Supercuts follows the same principle. Negative corre- 
lation between DISTANCE and a for 7-rays coming from a point source is also 
seen in Table 2 in accordance with the expected relationship between these image 
parameters. Thus, on the basis of results presented in Tables 2, 3 and 4, one can 
confidently say that there is a sufficient scope for utilizing the differences in the 
correlation between various image parameters for developing alternate 7/hadron 
segregation methodologies. 
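
The Fisher test behind Table 4 can be illustrated with a minimal Python sketch. It
assumes the standard form of the test for two independent samples; the correlation
coefficients and sample sizes below are hypothetical and are not taken from Tables 2
and 3, and the function names are our own.

```python
import math

def fisher_z(rho):
    """Fisher transformation of a correlation coefficient."""
    return 0.5 * math.log((1.0 + rho) / (1.0 - rho))

def fisher_test(rho1, n1, rho2, n2):
    """Z statistic for the difference of two independent correlations."""
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (fisher_z(rho1) - fisher_z(rho2)) / standard_error

# Hypothetical inputs: illustrative coefficients and sample sizes only
z_value = fisher_test(rho1=0.40, n1=5000, rho2=-0.05, n2=5000)
print(f"Z = {z_value:.2f}")
```

A large |Z| indicates that the two samples differ significantly in the correlation of
that parameter pair, which is the property exploited when ranking the pairs.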

Keeping in view the fact that, for proton-initiated showers (as also in general for
other cosmic-ray primaries), the image parameter α is expected to be independent of the
other image parameters because of the isotropic nature of the cosmic rays, we will not
use it in the ANN-based γ/hadron segregation methodology. Justification for following
this approach is also evident in Table 3, where, for the proton data sample, one finds
negligible correlation between α and the other image parameters. Thus, for extracting
the γ-ray signal from the cosmic-ray background, we will use the frequency distribution
of the α parameter for the ANN-selected events. The distribution is expected to be flat
for cosmic rays and should reveal a peak at smaller α values for γ-rays coming from a
point source. In all, we will use the following six parameters in the ANN-based
γ/hadron segregation methodology: zenith angle (θ), SIZE, LENGTH, WIDTH, DISTANCE and
FRAC2. The use of θ as an additional variable can be justified by keeping in view the
fact that, as θ increases, the line-of-sight distance to the shower maximum also
increases, making all projected dimensions of the shower (i.e. LENGTH and WIDTH)
smaller. The shape parameters LENGTH and WIDTH are expected to scale approximately as
cos(θ).

6 ANN methodology and a brief description of algorithms used

A neural network is a parallel distributed information processing structure consisting
of processing elements (which can possess a local memory and carry out localized
information processing operations) interconnected together with unidirectional signal
channels called connections. Each processing element has a single output connection
which branches into as many collateral connections as desired. All of the processing
that goes on within each processing element must be completely local, i.e. it must
depend only upon the current values of the input signals arriving at the processing
element via impinging connections and upon the values stored in the local memory of the
processing element. ANNs, like humans, learn by example, and can be configured for a
specific problem through a learning process that involves adjustments of the synaptic
connections, called weights, which exist between neurons. A network is composed of a
number of interconnected units, each unit having an input/output characteristic. The
output of any unit is determined by its I/O characteristic, its interconnection to
other units and external inputs. The feed-forward ANN is the simplest configuration and
is constructed using layers where all nodes in a given layer are connected to all nodes
in the subsequent layer. The network requires at least two layers, an input layer and
an output layer. In addition to this, the network can include any number of hidden
layers with any number of hidden nodes in each layer. The signal from the input vector
propagates through the network layer by layer till the output layer is reached. The
output vector represents the predicted output of the ANN and has a node for each
variable that is being predicted.
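
The layer-by-layer propagation described above can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the layer sizes, the
sigmoid transfer function, the random weights and the input events are all placeholders
chosen for the example, not the trained network of this work.

```python
import numpy as np

def sigmoid(x):
    """Logistic transfer function applied element-wise."""
    return 1.0 / (1.0 + np.exp(-x))

def forward(inputs, w_hidden, b_hidden, w_out, b_out):
    """Propagate a batch of input vectors through one hidden layer."""
    hidden = sigmoid(inputs @ w_hidden + b_hidden)   # shape (n_events, n_hidden)
    return sigmoid(hidden @ w_out + b_out)           # shape (n_events, n_outputs)

rng = np.random.default_rng(0)
n_inputs, n_hidden, n_outputs = 6, 35, 1             # illustrative layer sizes
w_hidden = rng.normal(scale=0.5, size=(n_inputs, n_hidden))
b_hidden = np.zeros(n_hidden)
w_out = rng.normal(scale=0.5, size=(n_hidden, n_outputs))
b_out = np.zeros(n_outputs)

events = rng.normal(size=(4, n_inputs))              # four made-up, pre-scaled events
outputs = forward(events, w_hidden, b_hidden, w_out, b_out)
print(outputs.shape)
```

Training then amounts to adjusting `w_hidden`, `b_hidden`, `w_out` and `b_out` so that
the outputs approach the desired targets.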

Depending upon the architecture in which the individual neurons are connected and the
error minimization scheme adopted, there can be several possible ANN configurations.
While algorithms like Standard backpropagation (along with its variants like
backprop-momentum, Vanilla backprop and Quickprop) and Resilient backpropagation come
under the category of local search algorithms, Conjugate Gradient methods, the
Levenberg-Marquardt algorithm and One Step Secant belong to the category of global
search algorithms. The hybrid algorithm category comprises models like Higher Order
Neuron and Neuro-Fuzzy systems. The Standard Backpropagation network [53] is the most
thoroughly investigated ANN algorithm to date. Backpropagation using gradient descent,
however, converges very slowly. Although the success of this algorithm in solving
large-scale problems depends critically on the user-specified learning rate and
momentum parameters, there are no standard guidelines for choosing these parameters.
The Resilient backpropagation (RProp) algorithm was proposed by Riedmiller [54] to
expedite the learning of a backpropagation algorithm. Unlike the standard
Backpropagation algorithm, RProp uses only the signs of the partial derivatives to
adjust the weight coefficients. In the above backprop-based gradient descent
algorithms, it is difficult to obtain a unique set of optimal parameters, due to the
existence of multiple local minima. The presence of these local minima hampers the
search for the global minimum because these algorithms frequently get trapped in
local-minimum regions and hence incorrectly identify a local minimum as the global
minimum.

The scaled conjugate gradient algorithms [55] initially use the gradient to compute a
search direction, and a line search algorithm is then used to find the optimal step
size along a line in the search direction. The Levenberg algorithm involves the use of
a "blending method" between the steepest descent method employed by the
backpropagation/resilient algorithms and the quadratic rule employed in conjugate
algorithms. The original Levenberg algorithm was improved further by Marquardt,
resulting in the Levenberg-Marquardt algorithm [56], by incorporating information about
the local curvature, hence forcing the search to move further in the direction in which
the gradient is smaller in order to get around the classic "error valley". Moreover,
gradient-descent-based algorithms like backpropagation, despite being popular among
researchers, are not known to be efficient algorithms because the gradient vanishes at
the solution. Hessian-based algorithms like Levenberg-Marquardt, on the contrary, allow
the network to learn more subtle features of a complicated mapping. The training
process converges as the solution is approached, because the Hessian does not vanish at
the solution. The Levenberg-Marquardt algorithm is basically a Hessian-based algorithm
for nonlinear least-squares optimization [57]. The One Step Secant method is an
approximation of the Gauss-Newton method for error minimization. The advantage of this
method is its smaller memory requirement and shorter computation time since, unlike
other algorithms, it does not store the complete Hessian matrix; instead, at each
training iteration it assumes that the previous Hessian was the identity matrix. This
has the added advantage that the new search direction can be found without having to
compute the matrix inverse [58]. The Higher Order Neuron model [59] includes quadratic
and higher-order basis functions in addition to the linear basis functions to reduce
the learning complexity.
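
The "blending" performed by the Levenberg-Marquardt method can be made concrete with a
small Python sketch. It is a toy illustration, assuming the textbook damped
Gauss-Newton update on a two-parameter curve fit; it is not the MATLAB training routine
used in this work, and the model, data and damping schedule are our own choices.

```python
import numpy as np

def residuals(params, x, y):
    """Difference between the toy model a * exp(b * x) and the data."""
    a, b = params
    return a * np.exp(b * x) - y

def jacobian(params, x):
    """Partial derivatives of the residuals with respect to a and b."""
    a, b = params
    return np.column_stack([np.exp(b * x), a * x * np.exp(b * x)])

def levenberg_marquardt(params, x, y, mu=1e-2, n_iter=50):
    params = np.asarray(params, dtype=float)
    for _ in range(n_iter):
        e = residuals(params, x, y)
        J = jacobian(params, x)
        # Large mu behaves like gradient descent, small mu like Gauss-Newton
        step = np.linalg.solve(J.T @ J + mu * np.eye(params.size), -J.T @ e)
        trial = params + step
        if np.sum(residuals(trial, x, y) ** 2) < np.sum(e ** 2):
            params, mu = trial, mu / 10.0   # accept step, reduce damping
        else:
            mu *= 10.0                      # reject step, increase damping
    return params

x = np.linspace(0.0, 1.0, 20)
y = 2.0 * np.exp(-1.5 * x)                  # noise-free synthetic data
fit = levenberg_marquardt([1.0, 0.0], x, y)
print(fit)
```

The matrix `J.T @ J` plays the role of the approximate Hessian, which is why the method
is grouped with the Hessian-based algorithms in the text.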

Neuro-fuzzy systems are models where ANN models are combined with fuzzy systems to use
the best features of both. While ANNs are known to be powerful in reaching a solution,
fuzzy systems have an advantage over ANNs in explaining the decision rules better [60].
Apart from employing these methods, we felt that the study would be incomplete without
the use of the comparatively less used "backprop-momentum" (backpropagation with a
momentum term added to the learning rule). The momentum term allows the network to
respond to the local gradient as well as to other trends in the error surface. Without
the momentum term, the network may get stuck in a shallow local minimum.
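
A minimal sketch of a gradient-descent update with a momentum term is given below. It
assumes the common "velocity" form of the rule; the learning rate, momentum coefficient
and the one-dimensional error surface are illustrative values, not the settings used in
this work.

```python
def momentum_step(weight, velocity, gradient, learning_rate=0.1, momentum=0.9):
    """One gradient-descent update with a momentum term."""
    # The velocity remembers earlier gradients, so the update follows recent trends
    velocity = momentum * velocity - learning_rate * gradient
    return weight + velocity, velocity

# Toy error surface E(w) = (w - 3)^2 with gradient 2 (w - 3)
weight, velocity = 0.0, 0.0
for _ in range(200):
    gradient = 2.0 * (weight - 3.0)
    weight, velocity = momentum_step(weight, velocity, gradient)
print(round(weight, 3))
```

Setting `momentum=0.0` recovers plain gradient descent, which makes the effect of the
extra term easy to compare on the same error surface.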

It is, however, important to mention here that, for real-world problems, the above
definitions serve only as a guideline, and the actual performance of the ANN models on
real-world problems does not necessarily follow the above theoretical predictions.
Therefore, these varied algorithms under the ANN domain cannot be used as off-the-shelf
algorithms until sufficient expertise in the field is obtained. There are several other
issues involved in designing and training a multilayer neural network. These are: (a)
selecting an appropriate number of hidden layers in the network; (b) selecting the
number of neurons to be used in each hidden layer; (c) finding a globally optimal
solution that avoids local minima; (d) converging to an optimal solution in a
reasonable period of time; (e) overtraining of the network; and (f) validating the
neural network to test for overfitting.

While a lot of emphasis has been put lately on the use of the Random Forest (RF)
technique as an efficient tool for γ/hadron segregation, we believe that a properly
selected and well-trained neural net algorithm is equally efficient for this purpose.
The study reported in [15] obtained a Quality Factor (QF) of ~2.8 and ~3.0 for the
Random Forest and ANN methods, respectively, when applied to the MAGIC data. The
maximum significance also turns out to be comparable, at ~8.74σ and ~8.75σ for RF and
ANN, respectively. In another study conducted by Boinee et al. [61] on the MAGIC
Cherenkov telescope experiment, a detailed comparison of RF, ANN, Support Vector
Machines and Classification Trees has been presented. While the optimized RF technique
resulted in a classification accuracy of ~81.24%, the classification accuracy for ANN
turned out to be ~81.75%, with mean error rates of ~0.276 and ~0.256 for the Random
Forest and ANN techniques, respectively, thereby suggesting that the two techniques are
at best comparable. The results obtained from the other methods turn out to be quite
inferior compared to the ANN and Random Forest, suggesting that both these methods are
equally suitable.



7 Gamma/hadron separation using ANN 

7.1 Preparation of Training, testing and validation data 

Training the ANN means iteratively minimizing the error between the desired output and
the ANN-generated value with respect to the network weights. Clearly, in order for the
network to yield appropriate outputs for given inputs, the weights must be set to
suitable values. This is done by 'training' the network on a set of input vectors for
which the ideal outputs (targets) are already known. For training the ANN we have used
~13750 simulated γ-ray events following a power-law distribution with a differential
spectral index of ~-2.6. This database was obtained by combining ~2750 events each at
5 different zenith angles (θ = 5°, 15°, 25°, 35° and 45°). The cosmic-ray data of
~11290 events used for training the ANN are actual experimental data recorded by the
TACTIC telescope and were prepared in the following manner. Around one-third of the
data used (~3163 events) were recorded in the Crab Nebula off-source direction. From
the Crab Nebula on-source database, collected between Nov. 10, 2005 and Jan. 30, 2006,
we used another ~3163 events for which α > 27° and which are hence certainly cosmic-ray
events. The remaining one-third portion of the data was taken from ~30 h of Mrk 421
off-source observations collected during the same observing season. The zenith angle of
the off-source observations was restricted to < 45°. The reason for generating the
training data in this manner was to ensure that all possible systematic influences on
the training of the network, such as variable sky brightness in different directions,
are also included during the training procedure. Using the experimental database for
the protons is a useful way of training, since it helps the ANN to recognize latent
patterns, if any, which can otherwise be difficult to replicate in simulations, e.g. in
situations when the sky brightness is higher than what has been assumed in simulations.
The importance of using real background hadronic events instead of simulated events has
also been demonstrated in [14].

The test data set consists of an independently generated sample of about 44831 events
(mixture of ~24603 simulated γ-ray and ~20228 actual cosmic-ray events), which has not
been used while training the ANN. This data set has exactly the same format as the
training data set and is generated in the same manner as the training data. A
validation data sample of ~29798 events (mixture of 16424 simulated γ-ray and 13374
actual cosmic-ray events) is used for verifying that the network retains its ability to
generalize and is not "over-trained".



7.2 ANN training and optimizing the number of hidden layer nodes 

The network used in this work comprises 6 nodes in the input layer, one node each for
zenith angle (θ), SIZE, LENGTH, WIDTH, DISTANCE and FRAC2, and one neuron in the output
layer whose value decides the class to which the event is assigned. The target value is
set to 0.1 or 0.9 depending upon whether the event in question is a γ-ray or a
cosmic-ray event, respectively. In order to determine the optimum number of neurons in
the hidden layer we evaluate the Mean Square Error (MSE) generated by the network. The
MSE for the network is defined as:

$$ \mathrm{MSE} = \frac{1}{P\,I} \sum_{p=1}^{P} \sum_{i=1}^{I} \left( D_{pi} - O_{pi} \right)^2 \tag{3} $$

where D_pi and O_pi are the desired and the observed values, P is the number of
training patterns and I is the number of outputs, which happens to be 1 in our case.
Thus the MSE defined above is the sum of the squared differences between the desired
output and the actual output of the output neurons, averaged over all the training
exemplars [62]. The ANN algorithms used in the present work are the following:
Backpropagation, Resilient Backpropagation, Backprop-momentum, Conjugate Gradient, One
Step Secant, Higher Order Neurons, Levenberg-Marquardt and Neuro-Fuzzy.

With regard to choosing the number of nodes in the hidden layer, it is well known that,
while using too few nodes will starve the network of the resources that it needs to
solve a particular problem, choosing too many nodes carries the risk of overfitting,
where the network tends to remember the training cases instead of generalizing the
patterns. In order to find the optimum number of nodes in the hidden layer we employed
a two-step procedure. In the first step we varied the number of nodes in the hidden
layer from 5 to 60 (in steps of 5 up to 40 and in steps of 10 thereafter) and noted
down the MSE for each of the configurations. In the second step, we deliberately used a
significantly higher number of nodes in the hidden layer (equal to 90) and then applied
the Singular Value Decomposition (SVD) method for identifying the redundant nodes
[32, 63-65]. It is worth mentioning here that determining the optimum number of neurons
in the hidden layer by sequentially increasing the number of nodes from 60 onwards
involves massive computational effort, which justifies applying the SVD method.
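
The MSE that is recorded for each hidden-layer configuration can be evaluated as in the
following Python sketch. The 0.1/0.9 target coding is taken from the text; the four
network outputs are hypothetical values chosen only to make the arithmetic easy to
follow.

```python
def mean_square_error(desired, observed):
    """MSE averaged over P patterns and I outputs, both given as nested lists."""
    n_patterns = len(desired)
    n_outputs = len(desired[0])
    total = sum((d - o) ** 2
                for d_row, o_row in zip(desired, observed)
                for d, o in zip(d_row, o_row))
    return total / (n_patterns * n_outputs)

GAMMA_TARGET, COSMIC_TARGET = 0.1, 0.9      # target coding used in the text

# Hypothetical network outputs for two gamma-ray and two cosmic-ray events
desired = [[GAMMA_TARGET], [GAMMA_TARGET], [COSMIC_TARGET], [COSMIC_TARGET]]
observed = [[0.2], [0.1], [0.7], [0.9]]
mse = mean_square_error(desired, observed)
print(f"MSE = {mse:.4f}")                   # prints MSE = 0.0125
```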

In the SVD method, the weight matrix (denoted by F in the present work) was generated
by finding the output of each of the 90 nodes before subjecting them to the nonlinear
transformation (i.e. the output of the hidden node). With a total of 25040 training
patterns and one hidden layer with 90 nodes, the matrix F thus has 25040 rows and 90
columns. The SVD of the matrix F is given by F = U S V^T, where U and V are orthogonal
matrices and S is a diagonal matrix with 25040 rows and 90 columns. The matrix S
contains the singular values of F on its diagonal. The dominance of the significant
singular values of F (say g out of a total of p singular values) is found by using the
so-called percentage of energy explained (P_ex), which is defined as:

$$ P_{ex} = \frac{\sum_{i=1}^{g} S_i^{2}}{\sum_{i=1}^{p} S_i^{2}} \times 100\% \tag{4} $$

where S_1, S_2, S_3, ..., S_p are the singular values of F arranged in descending order
[66]. The results of this study are shown in Fig. 3, where P_ex is plotted as a
function of the number of nodes in the hidden layer for a representative set of 4 ANN
algorithms. Consolidated results concerning the performance of the various algorithms
with regard to their corresponding MSE values for the training, test and validation
data samples are given in Table 5. The results presented in this table, shown
separately for 35 and 90 nodes in the hidden layer, can be used for checking whether
the ANN algorithm is "over-trained" or not. When the network is over-trained, the MSE
values for the test and validation data samples are expected to be significantly higher
than the corresponding value of MSE achieved during training.
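
The SVD-based estimate of the number of useful hidden nodes can be sketched as follows.
This is a hedged illustration: it assumes that the "energy" is measured by the squared
singular values, and it replaces the real 25040 x 90 matrix F by a small synthetic
matrix in which only 4 of 12 columns are linearly independent. The function names are
our own.

```python
import numpy as np

def percentage_energy_explained(singular_values):
    """Cumulative P_ex (in %) for g = 1 ... p leading singular values."""
    energy = np.asarray(singular_values, dtype=float) ** 2
    return 100.0 * np.cumsum(energy) / np.sum(energy)

def nodes_needed(hidden_pre_activation, threshold=99.9):
    """Smallest g whose leading singular values reach the P_ex threshold."""
    s = np.linalg.svd(hidden_pre_activation, compute_uv=False)  # descending order
    p_ex = percentage_energy_explained(s)
    return int(np.argmax(p_ex >= threshold) + 1)

# Synthetic stand-in for F: 500 patterns, 12 hidden nodes, only 4 independent ones
rng = np.random.default_rng(1)
independent = rng.normal(size=(500, 4))
mixing = rng.normal(size=(4, 12))
F = independent @ mixing
print(nodes_needed(F))
```

Redundant nodes show up as singular values that are negligible compared with the
leading ones, so they add almost nothing to P_ex.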

The optimum number of nodes for P_ex ~99.9% is also marked in the figure by full
vertical lines. For P_ex ~99.9%, one can easily find from this figure that the optimum
number of nodes needed for obtaining the desired results varies between ~22 and ~32.
Except for the Backpropagation-Momentum algorithm, which requires only ~5 nodes, the
remaining algorithms are also found to yield optimum performance with ~20 to ~30 nodes
in the hidden layer. The reason for Backprop-Momentum requiring so few nodes can be
understood from the manner in which the algorithm is trained. In this algorithm, a
momentum term is added to the Backprop to shorten the training time, with a slight
compromise on the performance of the network. This effect is seen in our case also,
where the Backprop-Momentum algorithm yields the worst MSE value compared to all other
algorithms.

Fig. 3. Percentage of energy explained (P_ex) as a function of the number of nodes in
the hidden layer for some representative algorithms: (a) Resilient backpropagation,
(b) One Step Secant, (c) Levenberg-Marquardt and (d) Conjugate Gradient. The optimum
number of nodes for P_ex ~99.9% is also marked in the figures by full vertical lines.

Table 5

MSE values of various ANN algorithms for the training, test and validation data samples.
The two values presented in the table correspond to 35 and 90 nodes in the hidden layer.

Algorithm              Train 35/90    Test 35/90     Valid 35/90
Backpropagation        0.103/0.102    0.103/0.103    0.103/0.103
Backprop Momentum      0.156/0.158    0.157/0.159    0.156/0.158
Resilient Backprop     0.035/0.033    0.036/0.035    0.036/0.034
Scale Conjugate        0.047/0.040    0.046/0.041    0.047/0.041
One Step Secant        0.053/0.050    0.053/0.051    0.053/0.051
Levenberg-Marquardt    0.017/0.015    0.017/0.030    0.017/0.031
Higher Order           0.039/0.033    0.040/0.033    0.040/0.034
Neuro Fuzzy            0.062/0.062    0.062/0.063    0.062/0.062

On examining Fig. 3 and Table 5 one can arrive at the following conclusions: (i) none
of the ANN algorithms used in this work is under-trained or over-trained if about 35
nodes are used in the hidden layer; (ii) increasing the number of nodes beyond 35
results in only a marginal reduction in the MSE; (iii) the MSE value yielded by the
Levenberg-Marquardt method with 35 nodes is the lowest compared to all other ANN
algorithms; (iv) increasing the number of nodes from 35 to 90 leads to the problem of
overfitting in the Levenberg-Marquardt method; (v) for the remaining algorithms no
overfitting problem is seen when 90 nodes are used in the hidden layer. The overfitting
of the Levenberg-Marquardt algorithm (with 90 nodes in the hidden layer) is most
probably related to the way in which the training is performed in this algorithm, more
specifically to how the algorithm accounts for the error as well as the gradient
information based on blending between the gradient descent method and the Gauss-Newton
rule. The Levenberg-Marquardt algorithm trains in such a way that large steps are taken
in the direction of low curvature to skip past the plateaus quickly, and smaller steps
are taken in the direction of high curvature to slowly converge to the global minimum.
Thus every narrow valley or plateau, even if it is a result of noise in the data, is
important for this method. Hence, when a larger number of nodes is used (i.e. 637
weights for the 90 nodes versus 252 weights for the 35 nodes in the hidden layer), the
algorithm becomes sensitive even to the noise present in the data, which with a smaller
number of nodes could have been ignored. The source of noise in our training/test
database is the inherent fluctuations in the shower development process. On the basis
of the above argument one can thus safely use 35 nodes in the hidden layer for all the
algorithms.
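
The over-training check based on Table 5 can be reproduced with a short Python sketch.
The MSE values are those listed in Table 5; the criterion itself (test or validation
MSE exceeding 1.5 times the training MSE) is an assumed threshold introduced here only
to make "significantly higher" concrete.

```python
# MSE values (train, test, validation) from Table 5 for 35 and 90 hidden nodes
TABLE_5 = {
    "Backpropagation":     {35: (0.103, 0.103, 0.103), 90: (0.102, 0.103, 0.103)},
    "Backprop Momentum":   {35: (0.156, 0.157, 0.156), 90: (0.158, 0.159, 0.158)},
    "Resilient Backprop":  {35: (0.035, 0.036, 0.036), 90: (0.033, 0.035, 0.034)},
    "Scale Conjugate":     {35: (0.047, 0.046, 0.047), 90: (0.040, 0.041, 0.041)},
    "One Step Secant":     {35: (0.053, 0.053, 0.053), 90: (0.050, 0.051, 0.051)},
    "Levenberg-Marquardt": {35: (0.017, 0.017, 0.017), 90: (0.015, 0.030, 0.031)},
    "Higher Order":        {35: (0.039, 0.040, 0.040), 90: (0.033, 0.033, 0.034)},
    "Neuro Fuzzy":         {35: (0.062, 0.062, 0.062), 90: (0.062, 0.063, 0.062)},
}

def is_over_trained(train, test, valid, tolerance=1.5):
    """Flag a network whose test or validation MSE exceeds tolerance * training MSE."""
    return max(test, valid) > tolerance * train

flagged = [(name, nodes)
           for name, by_nodes in TABLE_5.items()
           for nodes, mse in by_nodes.items()
           if is_over_trained(*mse)]
print(flagged)   # prints [('Levenberg-Marquardt', 90)]
```

With this threshold only the 90-node Levenberg-Marquardt network is flagged, in line
with conclusions (iv) and (v) above.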

It is worth mentioning here that the modification of the ANN structure by analyzing 
how much each node contributes to the actual output of the neural network and 
dropping the nodes which do not significantly affect the output is also referred to as 
pruning. The basic principle of pruning relies on the fact that if two hidden nodes 
give the same outputs for every input vector, then the performance of the neural 
network will not be affected by removing one of the nodes in the hidden layer. 
In the SVD approach, redundant hidden nodes cause singularities in the weight 
matrix which can be identified through inspection of its singular values. A non-zero 
number of small singular values indicates redundancy in the initial choice for the 
number of hidden layer nodes and the approach can be safely used for eliminating 
these nodes to attain the pruned network model. 

A plot of the mean square error as a function of the number of nodes in the hidden
layer for the most popular standard backpropagation network and the Levenberg-Marquardt
algorithm with a sigmoid transfer function is shown in Fig. 4a. While the MSE at the
end of the training, for 35 nodes in the hidden layer, is ~0.1032 for the
backpropagation network, the corresponding value for the Levenberg-Marquardt algorithm
is found to be ~0.0171. Although the MSE yielded by the Levenberg-Marquardt algorithm
is lower than the MSE values of the other training algorithms, the MSE for the
backpropagation algorithm is also shown, mainly because it has been considered a
"work-horse" in the field of neural computation.

Fig. 4. (a) Mean square error as a function of the number of nodes in the hidden layer
for the Backpropagation and the Levenberg-Marquardt algorithms. (b) Mean square error
for various ANN algorithms as a function of the number of iterations with 35 nodes in
the hidden layer.

The variation of the MSE as a function of the number of iterations for all the ANN
algorithms used is shown in Fig. 4b. The number of neurons in the hidden layer was
fixed at 35 for all these algorithms. In all the above algorithms, the training is
continued till the MSE reaches a plateau and does not decrease any further. About
10,000 iterations were generally found to be sufficient to train the ANN with the
various algorithms. The superior convergence of the Levenberg-Marquardt algorithm over
the conventionally used backpropagation and/or resilient backprop algorithms is not
totally unexpected and has been demonstrated by us on standard benchmark and regression
problems [60].

It is worth mentioning here that, for studying the performance of the various ANN
algorithms, we have used the BIKAS (BARC-IIT Kanpur ANN Simulator) ANN package [67] and
the MATLAB [68,69] neural net packages. While MATLAB has been used for the
backpropagation, resilient backpropagation, Scale Conjugate, backprop-momentum,
Levenberg-Marquardt and One Step Secant algorithms, the BIKAS package has been used for
the Higher Order Network and Neuro-Fuzzy models.



7.3 Testing and validation of the Levenberg-Marquardt ANN algorithm



Since the MSE returned by the Levenberg-Marquardt algorithm is lower than the MSE
values of the other methods, including the backpropagation method, we have used only
this algorithm on the test data set for a more detailed analysis. When the test
database is presented to the network, instead of yielding the desired output of 0.1 or
0.9, the ANN outputs a range of values between 0.1 and 0.9. The broad distributions
around 0.1 and 0.9, returned on testing the previously trained ANN algorithm, are on
account of the inherent shower-to-shower fluctuations on an event-to-event basis, even
though the training and test data are generated in a similar manner. The response of
the network (i.e. the frequency distribution of the selected events) for the test data
sample, comprising simulated γ-rays and actual background, as a function of the ANN
output is shown in Fig. 5a. The results obtained for the validation data sample are
shown in Fig. 5b. The excellent matching of the results obtained for the test and
validation data clearly demonstrates that the ANN has indeed "learned" and not simply
remembered the classification. It is important to mention here that no cut on α has
been applied to the data presented in these figures.



8 Determination of optimum ANN cut value 



For determining the ANN output cutoff value (η_cut), which will optimize the separation
of the two event classes (i.e. γ-rays and cosmic rays), one can maximize either the
Quality Factor (QF) or, more adequately, the statistical significance (N_σ). Following
their standard definitions [15], these are given by:

$$ QF = \frac{N_\gamma / N_{\gamma 0}}{\sqrt{N_p / N_{p0}}} = \frac{f_\gamma}{\sqrt{f_p}} \tag{5} $$

$$ N_\sigma = \frac{N_\gamma}{\sqrt{N_\gamma + 2 N_p}} \tag{6} $$

where N_γ and N_p are the numbers of γ-rays and hadrons, respectively, after
classification; N_γ0 and N_p0 are the numbers of γ-rays and hadrons, respectively,
before the classifier; and f_γ and f_p are the corresponding acceptances for γ-rays and
hadrons. Although many groups have used QF for optimizing the performance of their
classification methods [14], we have optimized the performance of the ANN on the basis
of maximizing N_σ. The reason for this is the fact that a high value of QF can also
result from a tight cut, which can reduce the γ-ray retention capability of the
classification method. Furthermore, maximization of N_σ also ensures that the
classification procedure is not biased unfavorably towards higher energies.
Optimization on the basis of maximizing N_σ has also been followed by other groups
[13,15].

Fig. 5. (a) ANN output of the Levenberg-Marquardt algorithm in response to simulated
γ-rays and actual background events of the test data sample comprising a total of 44831
events. (b) Same as (a) except for an independent validation data sample comprising a
total of 29798 events. No cut on α has been applied to the data presented in these
figures.
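
The difference between the two figures of merit can be illustrated with a short Python
sketch of Eqs. (5) and (6). All event counts below are hypothetical and were chosen
only to show that a very tight cut can raise QF while lowering N_σ.

```python
import math

def quality_factor(n_gamma, n_gamma0, n_hadron, n_hadron0):
    """QF = f_gamma / sqrt(f_p), Eq. (5)."""
    f_gamma = n_gamma / n_gamma0
    f_hadron = n_hadron / n_hadron0
    return f_gamma / math.sqrt(f_hadron)

def significance(n_gamma, n_hadron):
    """N_sigma = N_gamma / sqrt(N_gamma + 2 N_p), Eq. (6)."""
    return n_gamma / math.sqrt(n_gamma + 2.0 * n_hadron)

# Hypothetical sample: 200 gamma-rays and 4000 hadrons before classification
N_GAMMA0, N_HADRON0 = 200, 4000
for label, n_g, n_p in [("loose", 160, 400), ("tight", 20, 2)]:
    qf = quality_factor(n_g, N_GAMMA0, n_p, N_HADRON0)
    n_sigma = significance(n_g, n_p)
    print(f"{label}: QF = {qf:.2f}, N_sigma = {n_sigma:.2f}")
```

In this made-up example the tight cut retains only 10% of the γ-rays, yet it has the
higher QF, whereas the loose cut gives the higher N_σ.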

It is worth mentioning here that the definition of statistical significance (N_σ) given
above can only be used when N_γ is known beforehand, which is possible only when one is
dealing with simulated data. Since, in the case of actual data collected with Cherenkov
imaging telescopes, N_γ is instead calculated statistically by subtracting the expected
number of background events (e.g. from 27° < α < 81°, as used by us in [39] and in this
work) from the γ-ray domain events (e.g. α < 18° in our case), the definition of
statistical significance given above needs to be modified. While we have used the above
expression of N_σ for estimating η_cut, the significance of the γ-ray events found in
the actual Crab Nebula data has been calculated by following the more rigorous
maximum-likelihood-ratio method of Li and Ma [70].

The value of η_cut defines the decision boundary between the two event species, and in
order to determine its optimum value we used a data sample of about 12953 events
(mixture of 8865 simulated γ-ray and 4088 actual cosmic-ray events). The zenith angle
range of these events was again chosen to be 0°-45°. Since the value of N_σ depends
critically on the number of γ-rays present in the data, we have considered only
N_γ0 ~177 (i.e. ~2% of the total γ-rays present in the data sample) for determining the
optimum value of η_cut. An event is classified as a γ-ray-like event only if the
corresponding ANN output (η) is < η_cut and α < 18°. The calculation was performed by
varying η_cut from 0.05 to 1.0 in steps of 0.05 and recording N_σ at each value of
η_cut. The results of this study are given in Fig. 6, which shows the variation of N_σ
as a function of η_cut for the Levenberg-Marquardt based ANN algorithm. On examining
this figure one can see that a maximum value of N_σ ~6.8σ is obtained at η_cut ~0.475.
The above data have also been used for evaluating the performance of the other ANN
algorithms and finding their optimum η_cut values. The results of this study are
summarized in Table 6 where, in addition to the N_σ values yielded by the different
algorithms, we also give the corresponding η_cut range within which N_σ stays constant.
The lower value of η_cut defines the tight cut and the higher value designates the
loose cut.

Fig. 6. Variation of statistical significance (N_σ) as a function of ANN cut value
(η_cut) for the Levenberg-Marquardt algorithm.
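
The η_cut scan described above can be sketched in Python as follows. The sample sizes
(177 γ-rays, 4088 cosmic rays) and the scan grid follow the text, but the ANN outputs
are synthetic Gaussian numbers generated only for illustration, and the additional
α < 18° requirement is omitted for brevity.

```python
import math
import random

def significance(n_gamma, n_hadron):
    """N_sigma of Eq. (6); returns 0 when no event survives the cut."""
    total = n_gamma + 2.0 * n_hadron
    return n_gamma / math.sqrt(total) if total > 0 else 0.0

def scan_cut(gamma_outputs, hadron_outputs, cuts):
    """Return (cut, N_sigma) pairs for events with ANN output below each cut."""
    results = []
    for cut in cuts:
        n_gamma = sum(1 for eta in gamma_outputs if eta < cut)
        n_hadron = sum(1 for eta in hadron_outputs if eta < cut)
        results.append((cut, significance(n_gamma, n_hadron)))
    return results

random.seed(0)
# Synthetic ANN outputs: gamma-rays cluster near 0.1, cosmic rays near 0.9
gamma_outputs = [random.gauss(0.1, 0.15) for _ in range(177)]
hadron_outputs = [random.gauss(0.9, 0.15) for _ in range(4088)]
cuts = [round(0.05 * k, 2) for k in range(1, 21)]       # 0.05, 0.10, ..., 1.00
best_cut, best_sigma = max(scan_cut(gamma_outputs, hadron_outputs, cuts),
                           key=lambda pair: pair[1])
print(f"best cut = {best_cut}, N_sigma = {best_sigma:.2f}")
```

With real ANN outputs the same loop yields the N_σ versus η_cut curve of Fig. 6, from
which both the optimum cut and the plateau range can be read off.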

Table 6

Maximum value of the statistical significance N_σ yielded by various ANN algorithms,
along with the corresponding η_cut range within which N_σ stays constant. The lower
value of η_cut defines the tight cut and the higher value defines the loose cut.

Algorithm                   η_cut          N_σ
Backpropagation             0.40 - 0.67    5.21
Backpropagation momentum    0.30 - 0.57    5.33
Resilient Backprop          0.42 - 0.67    5.25
Scale Conjugate             0.40 - 0.67    4.80
One Step Secant             0.42 - 0.67    5.25
Levenberg-Marquardt         0.30 - 0.62    6.80
Higher Order                0.40 - 0.70    4.80
Neuro Fuzzy                 0.40 - 0.67    4.47
Dynamic Supercuts           -              6.09

The value of N_σ achieved with Dynamic Supercuts is also shown in the table for
comparison. It is quite evident from the table that, out of the 8 different ANN
algorithms studied here, the Levenberg-Marquardt algorithm yields the best results. The
value of N_σ for the other algorithms is found to vary from ~4.5σ (Neuro-Fuzzy) to
~5.3σ (backprop-momentum). Because of the superior performance of the
Levenberg-Marquardt algorithm, we will only use this algorithm for analyzing the actual
Crab Nebula data.



Referring back to Fig. 6, since the change in N_σ is insignificant when η_cut is varied
from ~0.3 to ~0.5, we will use a value of η_cut ~0.5 for analyzing the actual Crab
Nebula data. Admittedly, using η_cut ~0.5 also increases the cosmic-ray background.
The reason for choosing the higher η_cut value is to ensure that we retain the maximum
number of γ-rays from the source. For sources which are weaker than the Crab
Nebula, one can use η_cut ~0.3 so that contamination from the background is
reduced. Since our main preference is to observe relatively stronger sources
such as blazars, using η_cut ~0.5 is an obvious choice if we want to measure their
energy spectra beyond energies of ~10 TeV. Choosing tight cuts for detecting
weaker or new sources and loose cuts for obtaining the energy spectrum is a
well-known procedure, adopted by almost all the groups who work on Cherenkov
imaging telescopes.
9 Application of the ANN methodology to the Crab Nebula data collected
with the TACTIC telescope

In order to study the γ/hadron segregation potential of the ANN methodology, we
have applied this selection method to the Crab Nebula data collected with the
TACTIC telescope. For this purpose we reanalyzed ~101.44 h of Crab Nebula data
collected during Nov. 10, 2005 - Jan. 30, 2006. The zenith angle during the
observations was <45° and the data was collected with the inner 225 pixels (~4.5° × 4.5°)
of the full imaging camera, with the innermost 121 pixels (~3.4° × 3.4°) participating
in the trigger. The standard two-level image 'cleaning' procedure, with picture
and boundary thresholds of 6.5σ and 3.0σ, respectively, was employed to obtain the
clean Cherenkov images. The purpose of this image cleaning procedure is to take
care of the fluctuations in the image which arise due to electronic noise and night
sky background variations. Details of this analysis procedure and the data collecting
methodology for this period can be found in [42]. These clean Cherenkov
images were then characterized by calculating their standard image parameters like
LENGTH, WIDTH, DISTANCE, α, SIZE and FRAC2. Before investigating the
γ/hadron segregation potential using the ANN methodology, we first apply the
standard Dynamic Supercuts procedure [8] to the data for extracting the γ-ray
signal from the background cosmic-ray events.
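The two-level cleaning step described above can be sketched as follows. This is only a minimal illustration and not the TACTIC analysis code: the square pixel grid, the 8-neighbour adjacency rule, the per-pixel noise array and the function name `clean_image` are all assumptions made for the example. A pixel is retained if its signal exceeds the picture threshold, or if it exceeds the boundary threshold and touches at least one picture pixel.

```python
def clean_image(signal, sigma, picture=6.5, boundary=3.0):
    """Two-level image cleaning on a square grid.

    signal, sigma: 2-D lists of pedestal-subtracted counts and pixel noise.
    The grid layout and 8-neighbour adjacency are assumptions of this sketch.
    Returns a 2-D list in which rejected pixels are set to 0.0.
    """
    rows, cols = len(signal), len(signal[0])
    # Picture pixels: signal above the higher (picture) threshold.
    is_picture = [[signal[r][c] > picture * sigma[r][c] for c in range(cols)]
                  for r in range(rows)]
    cleaned = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if is_picture[r][c]:
                cleaned[r][c] = signal[r][c]
                continue
            if signal[r][c] <= boundary * sigma[r][c]:
                continue  # below the boundary threshold: always rejected
            # Boundary candidate: keep only if it touches a picture pixel.
            neighbours = [(r + dr, c + dc)
                          for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                          if (dr, dc) != (0, 0)]
            if any(0 <= rr < rows and 0 <= cc < cols and is_picture[rr][cc]
                   for rr, cc in neighbours):
                cleaned[r][c] = signal[r][c]
    return cleaned


# Toy image with unit noise in every pixel (hypothetical values).
toy_signal = [[0.0, 4.0, 0.0, 0.0],
              [0.0, 9.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 4.0]]
toy_sigma = [[1.0] * 4 for _ in range(3)]
toy_cleaned = clean_image(toy_signal, toy_sigma)
```

In the toy image the 4.0 pixel next to the 9.0 picture pixel survives, while the isolated 4.0 pixel in the corner is removed even though it is above the boundary threshold.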

The cut values used for the analysis are the following: 0.11° < LENGTH <
(0.260 + 0.0265 × ln S)°, 0.06° < WIDTH < (0.110 + 0.0120 × ln S)°, 0.52° <
DISTANCE < 1.27° cos^0.88 θ, SIZE > 450 d.c. (where 6.5 digital counts = 1.0
pe), α < 18° and FRAC2 > 0.35. It is important to emphasize here that the
Dynamic Supercuts γ-ray selection criteria used in the present analysis are the same
as those used in our previous work [39] for developing an ANN-based energy
reconstruction procedure for the TACTIC telescope. Since the present work uses
the same database as well as the same energy reconstruction procedure, we will
consider the previous work [39] as a benchmark for the present study.
Admittedly, there may be scope for optimizing the previously used Dynamic
Supercuts further (e.g. by using cuts which depend on both energy and zenith angle),
but the results of such a study will be presented elsewhere.
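For illustration, the cut set quoted above can be written as a single predicate. The sketch below is not the analysis code used for the paper: the function name and argument names are hypothetical, S is assumed to be SIZE expressed in digital counts, and θ is assumed to be the zenith angle of the observation.

```python
import math


def passes_dynamic_supercuts(length, width, distance, size_dc, alpha,
                             frac2, zenith_deg):
    """Return True if an image satisfies the Dynamic Supercuts quoted above.

    Angular quantities are in degrees; size_dc is SIZE in digital counts.
    Using SIZE in d.c. for S and the zenith angle for theta are assumptions.
    """
    if size_dc <= 450.0:           # SIZE cut (also keeps log() well defined)
        return False
    log_s = math.log(size_dc)
    cos_z = math.cos(math.radians(zenith_deg))
    return (0.11 < length < 0.260 + 0.0265 * log_s
            and 0.06 < width < 0.110 + 0.0120 * log_s
            and 0.52 < distance < 1.27 * cos_z ** 0.88
            and alpha < 18.0
            and frac2 > 0.35)


# Hypothetical image parameters, chosen only to exercise the function.
gamma_like = passes_dynamic_supercuts(0.20, 0.10, 0.90, 1000.0, 5.0, 0.50, 20.0)
off_axis = passes_dynamic_supercuts(0.20, 0.10, 0.90, 1000.0, 30.0, 0.50, 20.0)
```

Because the LENGTH and WIDTH upper bounds grow with ln S, the accepted region widens for brighter images, which is what makes the cuts "dynamic".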

A well-established procedure to extract the γ-ray signal from the cosmic-ray
background using a single imaging telescope is to plot the frequency distribution of
the α parameter, which is expected to be flat for the isotropic background of
cosmic-ray events [8]. For γ-rays coming from a point source, the distribution is expected
to show a peak at smaller α values. Defining α < 18° as the γ-ray domain and
27° < α < 81° as the background region, the number of γ-ray events is then
calculated by subtracting the expected number of background events (calculated on the
basis of the background region) from the γ-ray domain events. The number of γ-ray
events obtained after applying the above cuts is found to be ~(928±100) with a
statistical significance of ~9.40σ. The significance of the excess events has been
calculated by using the maximum likelihood ratio method of Li & Ma [70]. The
α-distribution is given in Fig. 7a and the corresponding differential energy
spectrum of the Crab Nebula, shown in Fig. 7b, has been computed using the following
formula:
$$
\frac{d\Phi}{dE}(E_i) = \frac{\Delta N_i}{\Delta E_i \sum_{j=1}^{5} A_{i,j}\,\eta_{i,j}\,T_j}
$$

where ΔN_i and dΦ(E_i)/dE are the number of events and the differential flux at
energy E_i, measured in the ith energy bin ΔE_i and over the zenith angle range
of 0°-45°, respectively. T_j is the observation time in the jth zenith angle bin with
corresponding energy-dependent effective area (A_i,j) and γ-ray acceptance (η_i,j).
The 5 zenith angle bins (j=1-5) used are 0°-10°, 10°-20°, 20°-30°, 30°-40° and
40°-50°, with effective collection area and γ-ray acceptance values available at 5°,
15°, 25°, 35° and 45°. The number of γ-ray events (ΔN_i) in a particular energy
bin is calculated by subtracting the expected number of background events from
the γ-ray domain events. The γ-ray differential spectrum, shown in Fig. 7b, has
been obtained after using appropriate values of effective collection area and γ-ray
acceptance efficiency (along with their energy and zenith angle dependence). A
power law fit (dΦ/dE = f_0 E^-Γ) with f_0 ~ (3.12 ± 0.48) × 10^-11 cm^-2 s^-1 TeV^-1
and Γ ~ 2.69 ± 0.14 is also shown in Fig. 7b. The fit has a χ²/dof ~ 3.64/6 with a
corresponding probability of ~0.72. Details of the energy reconstruction procedure
can be found in [39], which uses a 3:30:1 ANN configuration with SIZE, DISTANCE
and zenith angle as the inputs to the neural net.

Fig. 7. (a) Crab Nebula α-plot for ~101.44 h of data using Dynamic Supercuts γ-ray
selection criteria. (b) The corresponding differential energy spectrum of the Crab Nebula as
measured by the TACTIC telescope.
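The differential flux formula used for the spectrum can be evaluated bin by bin as in the following sketch. The function name and all numerical inputs are hypothetical and serve only to show how the excess in one energy bin is divided by the bin width and by the exposure summed over the zenith angle bins; the actual effective areas and acceptances come from the Monte Carlo simulations.

```python
def differential_flux(delta_n, delta_e, areas, acceptances, times):
    """Differential flux for one energy bin i.

    delta_n     : excess gamma-ray events in the bin
    delta_e     : width of the energy bin (TeV)
    areas       : effective area A_ij for each zenith bin j (cm^2)
    acceptances : gamma-ray acceptance eta_ij for each zenith bin j
    times       : observation time T_j in each zenith bin j (s)
    """
    if not (len(areas) == len(acceptances) == len(times)):
        raise ValueError("one value per zenith angle bin is required")
    # Exposure: sum over zenith bins of A_ij * eta_ij * T_j.
    exposure = sum(a * eta * t for a, eta, t in zip(areas, acceptances, times))
    if delta_e <= 0 or exposure <= 0:
        raise ValueError("bin width and exposure must be positive")
    return delta_n / (delta_e * exposure)


# Hypothetical inputs for a single energy bin and 5 zenith angle bins.
toy_flux = differential_flux(delta_n=100.0, delta_e=2.0,
                             areas=[1.0e9] * 5,
                             acceptances=[0.5] * 5,
                             times=[3600.0] * 5)
```

The result is in cm^-2 s^-1 TeV^-1 when the inputs carry the units given in the docstring.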

On applying the already trained Levenberg-Marquardt based ANN,
with 6:35:1 configuration, for extracting the γ-ray signal from the data, the number
of γ-ray events is found to be ~(1141±106) with a statistical significance of
~11.07σ. A value of η_cut ~0.50 has been used for selecting γ-ray events, and only
those events which satisfy the prefiltering cuts (SIZE > 50 pe and
0.4° < DISTANCE < 1.35°) are passed to the ANN for classification. The α-distribution
of the ANN selected events is given in Fig. 8a, while the corresponding differential
energy spectrum is shown in Fig. 8b. A power law fit (dΦ/dE = f_0 E^-Γ) with
f_0 ~ (1.16 ± 0.14) × 10^-11 cm^-2 s^-1 TeV^-1 and Γ ~ 2.52 ± 0.12 is also shown in
Fig. 8b. The fit has a χ²/dof ~ 4.58/7 with a corresponding probability of ~0.71.
The reasonably good match of the Crab Nebula spectrum with that obtained by the
Whipple and HEGRA groups [71,72] reassures us that the procedure followed here
for selecting γ-ray like events, as well as for obtaining the energy spectrum of a source,
is quite reliable.

Fig. 8. (a) Crab Nebula α-plot for ~101.44 h of data using Levenberg-Marquardt based
ANN γ-ray selection criteria. (b) The corresponding differential energy spectrum
of the Crab Nebula when the ANN is used for selecting γ-ray like events.
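As a consistency check on the fit probabilities quoted for the two power law fits, the upper-tail probability of a χ² variate can be computed from the standard finite series for integer degrees of freedom. The sketch below uses only the Python standard library; the function name is hypothetical and the routine is not part of the original analysis.

```python
import math


def chi2_survival(chi2, dof):
    """Upper-tail probability P(X >= chi2) for integer degrees of freedom."""
    if chi2 < 0 or dof < 1:
        raise ValueError("chi2 must be >= 0 and dof a positive integer")
    half = chi2 / 2.0
    if dof % 2 == 0:
        # Even dof: exp(-chi2/2) * sum_{k=0}^{dof/2 - 1} (chi2/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, dof // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd dof: 2Q(x) + 2Z(x) * sum_{r=1}^{(dof-1)/2} x^(2r-1) / (1*3*...*(2r-1)),
    # with x = sqrt(chi2), Z the standard normal density, Q its upper tail.
    x = math.sqrt(chi2)
    two_q = math.erfc(x / math.sqrt(2.0))
    density = math.exp(-half) / math.sqrt(2.0 * math.pi)
    term, total = x, 0.0
    for r in range(1, (dof - 1) // 2 + 1):
        if r > 1:
            term *= chi2 / (2 * r - 1)
        total += term
    return two_q + 2.0 * density * total


p_dsc = chi2_survival(3.64, 6)   # Dynamic Supercuts fit
p_ann = chi2_survival(4.58, 7)   # ANN fit
```

Both values come out close to the probabilities of ~0.72 and ~0.71 quoted in the text.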

On comparing the results of the Dynamic Supercuts γ-ray selection procedure (Fig. 7)
with those of the Levenberg-Marquardt based ANN (Fig. 8), it is evident that the
performance of the latter is somewhat superior, both in improving the
statistical significance of the γ-ray signal and in selecting a larger number of
γ-rays. Although the improvement (i.e. a gain of ~213 γ-ray like events along
with a signal enhancement from 9.4σ to 11.07σ) looks to be only modest, the main
advantage accruing from the ANN methodology is that it is more efficient at higher
energies, which has allowed us to extend the Crab Nebula energy spectrum up to
an energy of ~24 TeV. At γ-ray energies above ~9 TeV, the Levenberg-Marquardt
based ANN selects ~(85±28) events as against ~(24±9) events selected
by the Dynamic Supercuts procedure.
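The excess counts and significances compared above follow from the α-plot counts in the γ-ray domain (α < 18°) and in the background region (27° < α < 81°), so that the background normalisation is 18/54 = 1/3. A minimal sketch of this bookkeeping is given below. The significance uses the likelihood-ratio expression of Li & Ma [70] (commonly cited as their Eq. 17); the simple error propagation for the excess, the function names and the toy counts are assumptions of this example.

```python
import math


def alpha_plot_excess(n_on, n_bkg, on_width=18.0, bkg_width=54.0):
    """Excess events and their statistical error from alpha-plot counts."""
    ratio = on_width / bkg_width          # 1/3 for the regions used here
    excess = n_on - ratio * n_bkg
    error = math.sqrt(n_on + ratio ** 2 * n_bkg)   # assumed error propagation
    return excess, error


def li_ma_significance(n_on, n_bkg, ratio):
    """Likelihood-ratio significance of Li & Ma for on/background counts."""
    if n_on <= 0 or n_bkg <= 0:
        raise ValueError("counts must be positive")
    total = n_on + n_bkg
    term_on = n_on * math.log((1.0 + ratio) / ratio * n_on / total)
    term_bkg = n_bkg * math.log((1.0 + ratio) * n_bkg / total)
    # max() guards against tiny negative values from floating-point rounding.
    sig = math.sqrt(max(0.0, 2.0 * (term_on + term_bkg)))
    return sig if n_on >= ratio * n_bkg else -sig


# Toy counts (hypothetical, not taken from the TACTIC data).
toy_excess, toy_error = alpha_plot_excess(200, 300)
toy_sigma = li_ma_significance(200, 300, 18.0 / 54.0)
```

A negative return value from `li_ma_significance` simply flags a deficit rather than an excess in the γ-ray domain.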

When a value of η_cut ~0.30 is used, the number of γ-ray events is found to be
~(680±67) with a statistical significance of ~10.49σ, in agreement with
the discussion presented in Section 8. Although the use of the tight cut (i.e.
η_cut ~0.3) yields almost the same statistical significance (ignoring the slight degradation)
as the η_cut ~0.5 case, the number of γ-rays retained is significantly
smaller, and it is for this reason that we preferred to use the somewhat looser cut
η_cut ~0.5.

The performance of the Levenberg-Marquardt based ANN was further
validated by applying it to ~201.72 hours of on-source data collected on Mrk 421
with the TACTIC telescope during Dec. 07, 2005 to Apr. 30, 2006. The total data
used here also includes observations from Dec. 27, 2005 to Feb. 07, 2006, when
the source was found to be in a high state by the TACTIC telescope as compared
to the rest of the observation period [42]. When the already trained ANN is used for
extracting the γ-ray signal from the data, the number of γ-ray events is found
to be ~(1493±121) with a statistical significance of ~12.60σ. On comparing
these results with those obtained by using Dynamic Supercuts [42], which yield
~(1236±110) γ-ray events with a statistical significance of ~11.49σ, it is
reassuring to find that the ANN method is indeed more efficient than the Dynamic
Supercuts method. Furthermore, as expected, no signature of a γ-ray signal is seen
when the ANN method is applied to ~29.65 hours of off-source data. The
results obtained with the ANN method (~60±42 with a statistical significance of
~1.46σ) compare well with the results reported by us earlier using Dynamic
Supercuts [42] (~28±20 with a statistical significance of ~0.71σ). Detailed results
of the reanalysis using the ANN, including the energy spectrum of Mrk 421, will be
presented elsewhere.

Successful detection of γ-rays from Mrk 421 thus clearly demonstrates the capability
of the properly trained ANN to extract a γ-ray signal from a source other than the Crab
Nebula. It also indicates that the generalization capability of the ANN can be
enhanced if it is trained with experimental data collected from different directions
having somewhat variable sky brightness.



10 Comparison of Dynamic Supercuts and ANN analysis methods 

A detailed study has also been conducted to compare the overall γ-ray retention
capability of the Dynamic Supercuts and ANN analysis methods.
One of the ways to study this is to use the Monte Carlo simulated data for γ-rays
and plot the effective collection area as a function of primary energy
for the two γ-ray selection methodologies. The results of this study are shown in
Fig. 9, where the effective collection areas for the two γ-ray selection methodologies are
plotted as a function of energy for two representative zenith angle values of 15°
and 35°. Apart from showing the effective areas (i.e. A_γ(E) η_γ(E)) for the two
γ-ray selection methodologies, the corresponding effective area when no cuts are
applied to the data (i.e. A_γ(E)) is also shown for comparison. The results displayed
in the figure clearly indicate that the efficiency of Dynamic Supercuts is biased
towards lower energies (particularly at lower zenith angles). On the other hand,
it is the superior performance of the Levenberg-Marquardt based ANN (i.e.
more collection area at higher energies) which has enabled us to retain a relatively
higher number of events at energies above ~9 TeV in the actual data as compared
to the Dynamic Supercuts procedure.

The above conclusion has been further validated by obtaining scatter plots of
various image parameters, and the results of this study are shown in Fig. 10. This figure
displays scatter plots of LENGTH, WIDTH, DISTANCE and FRAC2 as a function
of SIZE for ~8358 events which have been characterized as γ-ray like by the ANN
algorithm and have α < 18°. For comparison, the Dynamic Supercuts boundaries
are also shown in the figure as full lines. It is quite evident from the figure that
the ANN method is not just selecting the same population of events as the
Dynamic Supercuts, but is also sensitive to events which lie outside
the strict Dynamic Supercuts boundaries. An alternative way to assess the
residual population of events selected by the ANN is to perform a logical NOT selection
between the ANN and the Dynamic Supercuts methods. On performing this
selection, the number of γ-ray events is found to be ~(453±74) with a statistical
significance of ~6.27σ, which again suggests that the ANN method is more useful
than the Dynamic Supercuts method for determining the energy spectrum of a
γ-ray source. On performing a logical AND selection between the ANN and the
Dynamic Supercuts methods, the number of γ-ray events obtained is ~(655±71),
corresponding to a statistical significance of ~9.50σ.
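The logical selections described above amount to splitting the ANN-selected events according to whether they also pass the Dynamic Supercuts. A minimal sketch is shown below; it assumes that the NOT selection means "accepted by the ANN but rejected by the Dynamic Supercuts", and the function name and per-event flags are hypothetical.

```python
def split_by_selection(ann_pass, dsc_pass):
    """Split events into ANN-and-DSC and ANN-but-not-DSC subsamples.

    ann_pass, dsc_pass: per-event booleans from the two selection methods.
    Returns two lists of event indices.
    """
    if len(ann_pass) != len(dsc_pass):
        raise ValueError("both flag lists must describe the same events")
    pairs = list(enumerate(zip(ann_pass, dsc_pass)))
    both = [i for i, (a, d) in pairs if a and d]          # logical AND
    ann_only = [i for i, (a, d) in pairs if a and not d]  # assumed NOT selection
    return both, ann_only


# Toy flags for five events (hypothetical).
ann_flags = [True, True, False, True, False]
dsc_flags = [True, False, False, True, True]
both_idx, ann_only_idx = split_by_selection(ann_flags, dsc_flags)
```

Each subsample would then go through the same α-plot background subtraction as the full data set to obtain its excess and significance.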

In order to understand the performance of the ANN for γ-rays at higher energies (i.e.
the events which eventually contribute to the last 3 energy bins of Fig. 8), Fig. 11
displays the scatter plot of ~606 events which have been characterized as γ-ray like
by the ANN and which have α < 18°. In other words, the data presented in this
figure represent a subsample of the data used in Fig. 10, with the additional condition
that the γ-ray like events should have energies above ~9 TeV. The capability of the
ANN in selecting events which lie outside the strict Dynamic Supercuts boundaries
is again evident from the figure. For example, the presence of a relatively large number
of events outside the LENGTH cut boundary (Fig. 11a) clearly demonstrates that
the efficiency of Dynamic Supercuts in retaining γ-rays is biased towards lower
energies. It is important to point out here that background cosmic-ray events
which are classified as γ-ray like by the event selection methodology are also
present in Fig. 10 and Fig. 11. Since subtraction of the background events (estimated
from the 27° < α < 81° region) from the γ-ray domain (defined as α < 18°)
cancels out these events in a statistical sense, it does not matter how the energy
estimate for a background event was obtained.

Fig. 9. Effective collection area as a function of the primary γ-ray energy for simulated
γ-ray showers at zenith angles of (a) 15° and (b) 35°. While the topmost curve (labeled as
Trigger alone) gives the effective area when no cuts are applied to the data, the remaining
2 curves (labeled as Trigger+DSC and Trigger+ANN) represent the effective areas when the
Dynamic Supercuts and ANN analysis methods, respectively, are applied to the data.

Since differences in the observed energy spectra of several active galactic nuclei,
especially at higher energies, can be used to study absorption effects at the source
or in the intergalactic medium due to the interaction of γ-rays with the extragalactic
background photons [73, 74], efficient retention of high energy γ-ray
events is always preferable. The superior performance of the ANN at higher energies
can thus play an important role in understanding the absorption effects at the
source or in the intergalactic medium.

Fig. 10. Scatter plots of (a) LENGTH (b) WIDTH (c) DISTANCE and (d) FRAC2 as a
function of SIZE for events which have been characterized as γ-ray like by the ANN and have α < 18°.
The Dynamic Supercuts boundaries are also shown in the figure as full lines.



It is worth mentioning here that once satisfactory training of the ANN is achieved,
the corresponding ANN generated weight-file can be easily used by an appropriate
subroutine of the main data analysis program for selecting γ-ray like events. Use of
a dedicated ANN software package is thus necessary only during the training of the
ANN and is not needed thereafter. Compared to the conventional γ/hadron
separation methods, the ANN-based procedure also offers advantages like
applicability over a wider zenith angle range and ease of implementation.
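To illustrate how a stored weight-file might be used without the training package, the sketch below evaluates a single-hidden-layer network of the 6:35:1 type and applies the η_cut threshold. The sigmoid activation, the layout of the weights, the function names and the convention that γ-ray like events have an output below η_cut (inferred from the tight and loose cut discussion in Section 8) are assumptions of this example, not a description of the actual TACTIC subroutine.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function (assumed activation)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def ann_output(inputs, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a network with one hidden layer and one output node.

    w_hidden: one row of input weights per hidden node; b_hidden: hidden biases;
    w_out: hidden-to-output weights; b_out: output bias.
    """
    hidden = [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)


def is_gamma_like(inputs, w_hidden, b_hidden, w_out, b_out, eta_cut=0.5):
    """Accept the event if the network output lies below eta_cut (assumed)."""
    return ann_output(inputs, w_hidden, b_hidden, w_out, b_out) < eta_cut


# Placeholder weights for a 6:35:1 network (all zeros, hypothetical).
n_in, n_hidden = 6, 35
zero_hidden = [[0.0] * n_in for _ in range(n_hidden)]
zero_bias = [0.0] * n_hidden
zero_out = [0.0] * n_hidden
out_mid = ann_output([0.0] * n_in, zero_hidden, zero_bias, zero_out, 0.0)
```

In practice the weight arrays would be read once from the weight-file and the six inputs would be scaled in the same way as during training.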
Fig. 11. Scatter plots of (a) LENGTH (b) WIDTH (c) DISTANCE and (d) FRAC2 as a
function of SIZE for events which have been characterized as γ-ray like by the ANN. Apart from
having α < 18°, these events also have energy above ~9 TeV. The Dynamic Supercuts
boundaries are also shown in the figure as full lines.

11 Conclusions 



Atmospheric Cherenkov imaging telescopes, especially monoscopic systems, have
to cope with a deluge of cosmic-ray background events, and the capability to
suppress these against the genuine γ-rays is one of the main challenges which
limits the sensitivity of these telescopes. The main purpose of this paper is to study
the γ/hadron segregation potential of various ANN algorithms for the TACTIC
telescope, by applying them to the Monte Carlo simulated data and to the observation
data on the Crab Nebula. The results of our study indicate that the performance of
the Levenberg-Marquardt based ANN algorithm is somewhat superior to the Dynamic
Supercuts procedure, especially at γ-ray energies above 9 TeV. Since for real
world problems it is not an easy task to identify the most suitable ANN algorithm
by just having a look at the problem, our results suggest that, while investigating
the comparative performance of other ANN algorithms, the Levenberg-Marquardt
algorithm deserves serious consideration. The main advantage of using the ANN
methodology for γ/hadron segregation work is that it is more efficient in retaining
higher energy γ-ray events, and this has allowed us to extend the TACTIC observed
energy spectrum of the Crab Nebula up to an energy of ~24 TeV. The reasonably good
match of the Crab Nebula spectrum as measured by the TACTIC telescope with
that obtained by the other groups reassures us that the ANN-based γ/hadron
segregation method, and also the procedure for obtaining the energy spectrum of a γ-ray
source, are quite reliable.